NSF PAR Search | NSF Public Access Repository

FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms

Qiu, H; Mao, W; Patke, A; Cui, S; Wang, C; Franke, H; Kalbarczyk, Z; Başar, T; Iyer, R (September 2024, MLSys)
Gibbons, Phillip B; Pekhimenko, Gennady; De Sa, Christopher (Ed.)
The emergence of ML in various cloud system management tasks (e.g., workload autoscaling and job scheduling) has become a core driver of ML-centric cloud platforms. However, there are still numerous algorithmic and systems challenges that prevent ML-centric cloud platforms from being production-ready. In this paper, we focus on the challenges of model performance variability and costly model retraining, introduced by dynamic workload patterns and heterogeneous applications and infrastructures in cloud environments. To address these challenges, we present FLASH, an extensible framework for fast model adaptation in ML-based system management tasks. We show how FLASH leverages existing ML agents and their training data to learn to generalize across applications/environments with meta-learning. FLASH can be easily integrated with an existing ML-based system management agent with a unified API. We demonstrate the use of FLASH by implementing three existing ML agents that manage (1) resource configurations, (2) autoscaling, and (3) server power. Our experiments show that FLASH enables fast adaptation to new, previously unseen applications/environments (e.g., 5.5× faster than transfer learning in the autoscaling task), indicating significant potential for adopting ML-centric cloud platforms in production.
Full Text Available
Power-aware Deep Learning Model Serving with µ-Serve. In Proceedings of the 2024 USENIX Annual Technical Conference (ATC 2024).

Qiu, H; Mao, W; Patke, A; Cui, S; Jha, S; Wang, C; Franke, H; Kalbarczyk, Z; Başar, T; Iyer, R (September 2024, USENIX ATC '24)
Begnum, Kyrre; Border, Charles (Ed.)
With the increasing popularity of large deep learning model-serving workloads, there is a pressing need to reduce the energy consumption of a model-serving cluster while still satisfying throughput or model-serving latency requirements. Model multiplexing approaches such as model parallelism, model placement, replication, and batching aim to optimize the model-serving performance. However, they fall short of leveraging the GPU frequency scaling opportunity for power saving. In this paper, we demonstrate (1) the benefits of GPU frequency scaling in power saving for model serving; and (2) the necessity for co-design and optimization of fine-grained model multiplexing and GPU frequency scaling. We explore the co-design space and present a novel power-aware model-serving system, μ-Serve. μ-Serve is a model-serving framework that optimizes the power consumption and model-serving latency/throughput of serving multiple ML models efficiently in a homogeneous GPU cluster. Evaluation results on production workloads show that μ-Serve achieves 1.2–2.6× power saving by dynamic GPU frequency scaling (up to 61% reduction) without SLO attainment violations.
Full Text Available
Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity

Mao, W; Qiu, H; Wang, C; Franke, H; Kalbarczyk, Z; Iyer, R; Başar, T (April 2024, NeurIPS)
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
Multi-agent reinforcement learning (MARL) has primarily focused on solving a single task in isolation, while in practice the environment is often evolving, leaving many related tasks to be solved. In this paper, we investigate the benefits of meta-learning in solving multiple MARL tasks collectively. We establish the first line of theoretical results for meta-learning in a wide range of fundamental MARL settings, including learning Nash equilibria in two-player zero-sum Markov games and Markov potential games, as well as learning coarse correlated equilibria in general-sum Markov games. Under natural notions of task similarity, we show that meta-learning achieves provably sharper convergence to various game-theoretical solution concepts than learning each task separately. As an important intermediate step, we develop multiple MARL algorithms with initialization-dependent convergence guarantees. Such algorithms integrate optimistic policy mirror descent with stage-based value updates, and their refined convergence guarantees (nearly) recover the best known results even when a good initialization is unknown. To the best of our knowledge, such results are also new and might be of independent interest. We further provide numerical simulations to corroborate our theoretical findings.
Full Text Available
Nano-infrared imaging of epitaxial graphene on SiC revealing doping and thickness inhomogeneities

https://doi.org/10.1063/5.0189724

Fralaide, M.; Chi, Y.; Iyer, R. B.; Luan, Y.; Chen, S.; Shinar, R.; Shinar, J.; Kolmer, M.; Tringides, M. C.; Fei, Z. (March 2024, Applied Physics Letters)

We report on a nano-infrared (IR) imaging and spectroscopy study of epitaxial graphene on silicon carbide (SiC) by using scattering-type scanning near-field optical microscopy (s-SNOM). With nano-IR imaging, we reveal in real space microscopic domains with distinct IR contrasts. By analyzing the nano-IR, atomic force microscopy, and scanning tunneling microscopy imaging data, we conclude that the imaged domains correspond to single-layer graphene, bilayer graphene (BLG), and higher-doped BLG. With nano-IR spectroscopy, we find that graphene can screen the SiC phonon resonance, and the screening is stronger at more conductive sample regions. Our work offers insights into the rich surface properties of epitaxial graphene and demonstrates s-SNOM as an efficient and effective tool in characterizing graphene and possibly other two-dimensional materials.
Full Text Available
ML-driven Malware that Targets AV Safety

https://doi.org/10.1109/DSN48063.2020.00030

Jha, S.; Cui, S.; Banerjee, S.; Tsai, T.; Kalbarczyk, Z.; Iyer, R. (June 2020, International Conference on Dependable Systems and Networks)

Full Text Available
Smart Malware using Leaked Control Data of Robotic Applications: The Case of Raven-II Surgical Robots

Chung, K; Li, X.; Tang, P.; Zhu, Z.; Kalbarczyk, Z.; Iyer, R.; Kesavadas, T. (September 2019, RAID)

Full Text Available
